Optimization hardness and biological transients¶
Preamble¶
After opening this notebook, run the cells below before anything else. These will import libraries and define functions.
This notebook requires the following packages, which are pre-installed on Google Colab.
- Python 3
- NumPy
- SciPy
- Matplotlib
- numba (optional hardware acceleration)
## Preamble / required packages
import numpy as np
import requests
from IPython.display import Image
np.random.seed(0)
## Import local plotting functions and in-notebook display functions
import matplotlib.pyplot as plt
from IPython.display import Image, display
%matplotlib inline
## default first plot color is black
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=['k']) # Set default plot color to black
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=plt.cm.tab10.colors) # Reset to default colors
plt.rcParams['legend.frameon'] = False
## Disable warnings
import warnings
warnings.filterwarnings('ignore')
### Load remote Python files on which this notebook depends
# url = "https://raw.githubusercontent.com/williamgilpin/illotka/main/base.py"
# resp = requests.get(url)
# resp.raise_for_status()
# with open("base.py", "w") as f:
# f.write(resp.text)
# url = "https://raw.githubusercontent.com/williamgilpin/illotka/main/utils.py"
# resp = requests.get(url)
# resp.raise_for_status()
# with open("utils.py", "w") as f:
# f.write(resp.text)
from base import *
from utils import *
# Autoreload
%load_ext autoreload
%autoreload 2
print("Loading complete")
Loading complete
Motivation: revisiting stability vs. complexity¶
- May 1972: Near steady-state, the dynamics of ecological systems are locally linear,
$$ \frac{dN_i}{dt} = \sum_{j=1}^N A_{ij} N_j $$
where $\mathbf{N} = (N_1, N_2, \cdots, N_N)$ is the vector of species abundances, and $A_{ij}$ is the interaction coefficient between species $i$ and $j$.
We use units where $\mathbf{N}^* = \mathbf{0}$ is a steady state.
A null expectation for the stability of the steady state comes from random matrix theory. We sample
$$ A = Q - d\, I $$
where $Q_{ij} \sim \mathcal{N}(0,1)$ for $i \neq j$ and $d$ is a constant density-limitation that prevents divergence.
- The sign of the largest eigenvalue of $A$ tells us whether the ecosystem is stable.
ecosystem_sizes = np.arange(10, 100).astype(int)
all_max_eigenvalues = []
for n_species in ecosystem_sizes:
A = 0.1 * np.random.normal(size=(n_species, n_species))
np.fill_diagonal(A, -1) # set the density limitation to magnitude 1
eigvals = np.linalg.eigvals(A)
all_max_eigenvalues.append(np.max(eigvals.real))
plt.plot(ecosystem_sizes, all_max_eigenvalues, '.')
plt.xlabel("Number of species")
plt.ylabel("Largest eigenvalue of $A$")
Text(0, 0.5, 'Largest eigenvalue of $A$')
Larger matrices are more likely to be unstable.
Yet many large ecosystems are stable.
Many papers "resolve" May's paradox by sampling from structured distributions, altering the dynamics, etc.
Is steady state stability even relevant in high-dimensional systems?
The Random Lotka-Volterra model¶
We consider random ecosystems given by the generalized Lotka-Volterra equation,
$$ \frac{dN_i}{dt} = N_i \left( r_i + \sum_{j=1}^N A_{ij} N_j \right) $$
where $N_i$ is the population of species $i$, $r_i$ is the intrinsic growth rate of species $i$, and $A_{ij}$ is the interaction coefficient between species $i$ and $j$.
To study the behavior of this model, we consider the case where $r_i \sim \mathcal{N}(0,1)$, and the matrix $A \in \mathbb{R}^{N \times N}$ has the form
$$ A = Q - d\, I $$ where $I \in \mathbb{R}^{N \times N}$ is the identity matrix, $Q_{ij} \sim \mathcal{N}(0,1)$ and $d$ is a constant density-limitation that sets the carrying capacity of the system.
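The `GaussianLotkaVolterra` class used below is defined in `base.py`. A minimal sketch of what an equivalent integration might look like with `scipy.integrate.solve_ivp` (the function name `glv_rhs` and the parameter choices here are illustrative, not the notebook's actual implementation):

```python
import numpy as np
from scipy.integrate import solve_ivp

def glv_rhs(t, n, r, A):
    """Right-hand side of the generalized Lotka-Volterra equations."""
    return n * (r + A @ n)

rng = np.random.default_rng(0)
n_species, d = 50, 12.5
A = rng.normal(size=(n_species, n_species)) - d * np.identity(n_species)  # A = Q - d I
r = rng.normal(size=n_species)
ic = 0.4 * rng.uniform(size=n_species)  # random initial abundances

sol = solve_ivp(glv_rhs, (0, 100), ic, args=(r, A), method="LSODA")
print(sol.y.shape)  # (n_species, number of time points)
```

The strong self-limitation $d$ keeps the trajectories bounded, so a stiff-aware solver like LSODA settles onto the fixed point without difficulty.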
## Initialize the model
eq = GaussianLotkaVolterra(
n_species=200, # Set the number of species
d=12.5, # Set the density-limitation
# random_state=0, # Set the random seed
)
eq.A = np.random.normal(size=(eq.n_species, eq.n_species)) - 12.5 * np.identity(eq.n_species)
eq.r = np.random.normal(size=eq.n_species)
## Initial conditions and integration settings
ic = 0.4 * np.random.uniform(size=eq.n_species) # Random initial conditions
tmax = 20000 # Maximum integration duration
## Numerically integrate the model
t, y = eq.integrate(tmax, ic)
## Plot the results
plt.figure(figsize=(7, 4))
plt.semilogx(t, y, color="k", lw=1.5, alpha=0.3);
plt.xlabel("Time")
plt.ylabel("Species abundance")
plt.xlim(1e-2, np.max(t))
plt.ylim(0, None)
(0.0, 0.42075787312515656)
np.max(np.real(np.linalg.eigvals(eq.jac(0, y[-1]))))
np.float64(-0.00010395421886044594)
How many species survive?¶
- The random Lotka-Volterra model always has a fixed point in which some species coexist at constant abundance, and some species are excluded (abundance zero)
def compute_number_survive(y, tolerance=1e-12):
return np.sum(y > tolerance)
nonzero_remaining = compute_number_survive(y[-1])
print(f"{nonzero_remaining} species out of {eq.n_species} remain")
104 species out of 200 remain
We now repeat the experiment many times¶
n_replicates = 20
tmax = 1000
number_survive = []
for i in range(n_replicates):
progress_bar(i, n_replicates) # Print progress bar for loop
eq = GaussianLotkaVolterra(n_species=200, d=12.5, random_state=i)
eq.A = np.random.normal(size=(eq.n_species, eq.n_species)) - 12.5 * np.identity(eq.n_species)
# eq.A = eq.A * np.identity(n) # Wipe out all off-diagonal elements
eq.r = np.random.normal(size=eq.n_species)
t, y = eq.integrate(tmax, ic)
n_survive = compute_number_survive(y[-1])
number_survive.append(n_survive)
plt.hist(number_survive)
plt.xlabel("Number of surviving species");
plt.ylabel("Frequency");
[################### ]
print(np.mean(number_survive))
print(np.std(number_survive))
104.7
5.771481612203231
We notice that our observed values are very close to a binomial distribution with $p=0.5$, which has mean $N p$ and standard deviation $\sqrt{N p (1-p)}$.
print(0.5 * eq.n_species)
print(np.sqrt(eq.n_species * 0.5 * (1 - 0.5)))
100.0
7.0710678118654755
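The same moments can be read off `scipy.stats.binom` directly (a quick cross-check, not part of the original analysis pipeline):

```python
from scipy.stats import binom

# Reference Binomial(N, 1/2) distribution for the number of survivors
N, p = 200, 0.5
dist = binom(N, p)
print(dist.mean())  # N p
print(dist.std())   # sqrt(N p (1 - p))
```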
Properties of the Random Lotka-Volterra model¶
Proven in Serván et al. 2018.
- The number of surviving species follows a binomial distribution with mean $N/2$.
- The steady-state solution is unique (global attractor).
- There always exists a minimum density-limitation $d$ above which the system reaches a steady state.
Main assumption: Interactions and growth rates are drawn from a symmetric distribution with finite variance.
An interesting property of the random Lotka-Volterra model emerges when we remove all interactions (off-diagonal elements of the interaction matrix $A$)
eq = GaussianLotkaVolterra(
n_species=200, # Set the number of species
d=12.5, # Set the density-limitation
random_state=0 # Set the random seed
)
eq.A = np.random.normal(size=(eq.n_species, eq.n_species)) - 12.5 * np.identity(eq.n_species)
# eq.A = eq.A * np.identity(eq.n_species) # Wipe out all off-diagonal elements
eq.r = np.random.normal(size=eq.n_species)
## Set the integration parameters
tmax = 1000
ic = 0.4 * np.random.uniform(size=eq.n_species) # Random initial conditions
## Numerically integrate the model
t, y = eq.integrate(tmax, ic)
## Plot the results
n_survive = compute_number_survive(y[-1])
plt.semilogx(t, y)
plt.title(f"Number of surviving species: {n_survive}")
Text(0.5, 1.0, 'Number of surviving species: 109')
We can see that the distribution of surviving species still follows a binomial distribution.
We can think of interspecies interactions as rotating the final solution vector, while preserving its $L_0$ norm (the number of non-zero elements).
To see this effect, we can gradually increase the interaction strength from 0 to 1 and plot how the steady-state abundance of a given species changes.
all_terminal_states = []
all_number_survive = []
ic = 0.4 * np.random.uniform(size=eq.n_species) # Random initial conditions
interaction_strengths = np.linspace(0, 1, 20)
for interaction_strength in interaction_strengths:
eq = GaussianLotkaVolterra(
n_species=200, # Set the number of species
d=12.5, # Set the density-limitation
random_state=0 # Set the random seed
)
eq.A = interaction_strength * np.random.normal(size=(eq.n_species, eq.n_species)) - 12.5 * np.identity(eq.n_species)
eq.r = np.random.normal(size=eq.n_species)
t, y = eq.integrate(tmax, ic)
## Store the results
all_number_survive.append(compute_number_survive(y[-1]))
all_terminal_states.append(y[-1])
all_terminal_states = np.array(all_terminal_states)
plt.plot(interaction_strengths, all_terminal_states);
plt.xlabel("Interaction strength")
plt.ylabel("Steady-state abundance of a given species")
Text(0, 0.5, 'Steady-state abundance of a given species')
How do extreme values affect the dynamics?¶
The generalized normal distribution is parametrized by shape parameter $\beta$, which controls the tail thickness. It is given by
$$ p(x) = \frac{\beta}{2\sigma\Gamma(1/\beta)} e^{-\left(\frac{|x-\mu|}{\sigma}\right)^\beta} $$
- when $\beta < 2$, the distribution has fatter tails than the normal distribution ($\beta = 2$ recovers the Gaussian).
- when $\beta > 2$, the distribution has lighter tails than the normal distribution.
We will still assume $\mu=0$, and so the results of Serván et al. 2018 should still apply.
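The sampler `normal_generalized` is defined in the helper module `utils.py`. Its behavior can be approximated with `scipy.stats.gennorm`, whose density matches the formula above with $\mu = 0$, $\sigma = 1$ (a sketch; the wrapper name `normal_generalized_sketch` is illustrative):

```python
import numpy as np
from scipy.stats import gennorm

def normal_generalized_sketch(size, beta=2.0, random_state=None):
    """Sample the generalized normal distribution; beta=2 recovers the Gaussian."""
    return gennorm.rvs(beta, size=size, random_state=random_state)

a_fat = normal_generalized_sketch((100, 100), beta=0.5, random_state=0)
a_light = normal_generalized_sketch((100, 100), beta=5, random_state=0)
# Fat-tailed samples place far more mass at extreme values
print(np.mean(np.abs(a_fat) > 3), np.mean(np.abs(a_light) > 3))
```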
a_gaussian = np.random.normal(size=(100, 100))
a_fat_tail = normal_generalized(size=(100, 100), beta=0.5)
a_light_tail = normal_generalized(size=(100, 100), beta=5)
plt.figure(figsize=(5, 5))
plt.hist(a_gaussian.flatten(), bins=100)
plt.xlim(-3, 3)
plt.xlabel("Interaction strength")
plt.ylabel("Frequency")
plt.figure(figsize=(5, 5))
plt.hist(a_fat_tail.flatten(), bins=100)
plt.xlim(-3, 3)
plt.xlabel("Interaction strength")
plt.ylabel("Frequency")
plt.figure(figsize=(5, 5))
plt.hist(a_light_tail.flatten(), bins=100)
plt.xlim(-3, 3)
plt.xlabel("Interaction strength")
plt.ylabel("Frequency")
Text(0, 0.5, 'Frequency')
We can see how the distribution's tail thickness affects the dynamics of the Lotka-Volterra model.
## Initialize the model and parameters
eq = GaussianLotkaVolterra(
n_species=300, # Number of species
random_state=0, # Random seed
tolerance=1e-10 # Precision floor for integration
)
d = 18 # Density-limitation
## Sample the interaction matrix from a normal distribution
eq.A = np.random.normal(size=(eq.n_species, eq.n_species)) - d * np.identity(eq.n_species)
eq.r = np.random.normal(size=eq.n_species)
## Sample the interaction matrix from a modified normal distribution
# eq.A = 1.5 * normal_generalized(size=(eq.n_species, eq.n_species)) - d * np.identity(eq.n_species)
# eq.r = normal_generalized(size=eq.n_species)
## Set the integration parameters
tmax = 10000 # Maximum integration duration
ic = 0.5 * np.random.uniform(size=eq.n_species) # Random initial conditions
## Numerically integrate the model
t, y = eq.integrate(tmax, ic)
## Plot the results
plt.figure(figsize=(7, 4))
plt.semilogx(t, y, color="k", lw=1.5, alpha=0.2);
plt.xlabel("Time")
plt.ylabel("Species abundance")
nonzero_remaining = compute_number_survive(y[-1])
plt.title(f"Number of surviving species: {nonzero_remaining} / {eq.n_species}")
plt.xlim(1e-2, 1000)
plt.ylim(0, 0.1)
(0.0, 0.1)
How long does it take to reach steady-state?¶
Given a steady-state solution $\mathbf{N}^*$, the settling time is the time it takes for the solution to reach the steady-state within a given tolerance $\xi$:
$$ \text{argmin}_{t} \bigg( \dfrac{\| \mathbf{N}(t) - \mathbf{N}^* \|}{\| \mathbf{N}(t) \|} < \xi \bigg) $$
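The helper `compute_settling_time` comes from `utils.py`. One possible implementation consistent with the definition above, using the final state of the trajectory as a stand-in for $\mathbf{N}^*$ (the function name and tolerance default here are illustrative):

```python
import numpy as np

def compute_settling_time_sketch(y, xi=1e-3):
    """Index of the first time after which the trajectory stays within
    relative tolerance xi of the final state (a proxy for N*)."""
    n_star = y[-1]
    dist = np.linalg.norm(y - n_star, axis=1) / np.linalg.norm(y, axis=1)
    outside = dist >= xi
    # Count tolerance violations at or after each index; settle where that count hits zero
    violations_after = np.cumsum(outside[::-1])[::-1]
    return int(np.argmax(violations_after == 0))

# Toy trajectory that decays exponentially onto a fixed point
t = np.linspace(0, 10, 1000)
y = np.exp(-t)[:, None] * np.ones((1, 3)) + 1.0
tind = compute_settling_time_sketch(y)
print(t[tind])
```

The backward cumulative count guards against trajectories that briefly dip inside the tolerance ball and leave it again.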
## Initialize the model and parameters
eq = GaussianLotkaVolterra(
n_species=200, # Number of species
random_state=0, # Random seed
tolerance=1e-10 # Precision floor for integration
)
## Sample the interaction matrix from a modified normal distribution
eq.A = 1.5 * normal_generalized(size=(eq.n_species, eq.n_species), beta=0.1) - 18 * np.identity(eq.n_species)
eq.r = normal_generalized(size=eq.n_species, beta=0.1)
## Set the integration parameters
tmax = 1e7 # Maximum integration duration
ic = 0.5 * np.random.uniform(size=eq.n_species) # Random initial conditions
## Numerically integrate the model
t, y = eq.integrate(tmax, ic)
tind = compute_settling_time(y)
print(f"Settling time: {t[tind]:.2f}")
nonzero_remaining = np.sum(y[-1] > 1e-12) # Precision floor
print(f"{nonzero_remaining} species out of {eq.n_species} remain")
Settling time: 23225.38
105 species out of 200 remain
We can also get long transients by increasing the system size¶
- We rescale $r_i$ by $N$ and $A_{ij}$ by $\sqrt{N}$ to remove trivial scaling with $N$.
population_sizes = 5 * np.unique(np.logspace(0, 2, 15).astype(int))
all_settling_times = []
for n_species in population_sizes:
eq = GaussianLotkaVolterra(
n_species=n_species,
random_state=0,
tolerance=1e-10
)
ic = 0.1 * np.random.uniform(size=eq.n_species)
eq.A = (1 / np.sqrt(eq.n_species)) * np.random.normal(size=(eq.n_species, eq.n_species)) - 18 * np.identity(eq.n_species)
eq.r = (1 / eq.n_species) * np.random.normal(size=eq.n_species)
t, y = eq.integrate(tmax, ic)
tind = compute_settling_time(y)
print(f"Initial population size: {eq.n_species}, Settling time: {t[tind]:.2f}")
all_settling_times.append(t[tind])
Initial population size: 5, Settling time: 4238.32
Initial population size: 10, Settling time: 6667.35
Initial population size: 15, Settling time: 1722.09
Initial population size: 25, Settling time: 2097.47
Initial population size: 35, Settling time: 9678.69
Initial population size: 50, Settling time: 13406.45
Initial population size: 65, Settling time: 25458.02
Initial population size: 95, Settling time: 380297.58
Initial population size: 130, Settling time: 101534.37
Initial population size: 185, Settling time: 439534.86
Initial population size: 255, Settling time: 292105.65
Initial population size: 355, Settling time: 301577.37
Initial population size: 500, Settling time: 404375.74
plt.figure(figsize=(5, 5))
plt.loglog(population_sizes, all_settling_times, '.')
plt.xlabel("Population size")
plt.ylabel("Settling time")
Text(0, 0.5, 'Settling time')
Supertransients in high-dimensional nonlinear systems¶
The lifetime of transients can scale faster than linearly with the system size $N$.
Examples: gradient descent in high-dimensional optimization, chimera states in the Kuramoto model, and subcritical turbulence.
# Image("./resources/puffs.png", width=600)
# Image("./resources/scaling.png", width=600)
## Two images one above the other
display(Image("./resources/puffs.png", width=600))
display(Image("./resources/scaling.png", width=600))
Black traces show the mean lifetime of turbulent "puffs" in a pipe flow as a function of the Reynolds number. Colored traces show the length of time before puffs divide and spread. From Avila et al. 2011
Hierarchies produce transients in the Lotka-Volterra model¶
Trophic interactions can produce extremal values in the Lotka-Volterra model. We consider sampling a family of matrices of the form
$$ A = P^\top (A^{(0)} - d\, I) P + \epsilon \, E, $$
- As before, $A^{(0)}_{ij}, r_i \sim \mathcal{N}(0, 1)$.
- $P \in \mathbb{R}^{N \times N}$ is a low-rank assignment matrix describing functional redundancy (species with similar interactions at the same trophic level).
- $E \in \mathbb{R}^{N \times N}$ is a matrix of random noise that prevents two species from being exactly equivalent.
- $\epsilon \ll 1$ is a small constant that controls the degree to which redundant species deviate from exact redundancy.
## Loads a figure from the resources folder
Image("./resources/fig_model.jpg", width=1200)
For an analysis of equilibrium properties of a Lotka-Volterra model with hierarchical structure and functional redundancy, see Tikhonov (2017).
m_trophs = 3
n_species = 6
tmax = 1e8
interactions_trophic = np.array([
[0, 0.5, 0],
[-0.5, 0, 0.3],
[0, -0.8, 0],
])*50
p_mat = np.array(
[[1, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 1]]
)
eq = RandomLotkaVolterra(n_species)
eq.r = np.array([0.8, 0.9, 1]) @ p_mat
eq.A = p_mat.T @ (interactions_trophic - 5*np.identity(m_trophs)) @ p_mat #+ 1e-6 * np.random.normal(size=(n_species, n_species))
ic = np.ones(n_species) * 0.1
t, y = eq.integrate(tmax, ic)
plt.semilogx(t, y);
plt.legend(np.arange(n_species), frameon=False);
plt.xlim(1e-3, t[-1]);
plt.xlabel("Time");
plt.ylabel("Species abundance");
Larger ecosystems¶
Instead of imposing a specific hierarchy, we'll sample the assignment matrix $P$ from a family of random low-rank matrices.
The small parameter $\epsilon$ controls the degree to which redundant species deviate from exact redundancy.
# Initialize the model
eq = RandomLotkaVolterra(
n_species=200, # Number of species
eps=1e-3 # Degree of functional redundancy
)
## Run a simulation and plot the results
ic = np.random.uniform(size=eq.n_species)
tmax = 1e8
t, y = eq.integrate(tmax, ic)
plt.semilogx(t, y);
plt.xlim(1e-3, None)
plt.xlabel("Time")
plt.ylabel("Species abundance")
Text(0, 0.5, 'Species abundance')
We next vary the degree of intratrophic vs intertrophic variation and measure the settling time.¶
tmax = 1e8
n_val = 200
ic = np.random.uniform(size=n_val)
settling_times = []
eps_vals = np.logspace(-5, 0, 10)
for eps in eps_vals:
print(f"eps: {eps}")
eq = RandomLotkaVolterra(
n_val,
random_state=0,
eps=eps,
sigma=2.0,
d=4.5,
kfrac=0.2,
connectivity=0.05
)
t, y = eq.integrate(tmax, ic)
tind = compute_settling_time(y)
settling_times.append(t[tind])
plt.figure(figsize=(5, 5))
plt.loglog(eps_vals, settling_times, '.')
plt.xlabel("Distance from Exact Intratrophic Interchangeability (Functional Redundancy) (epsilon)")
plt.ylabel("Settling Time")
eps: 1e-05
eps: 3.5938136638046256e-05
eps: 0.0001291549665014884
eps: 0.0004641588833612782
eps: 0.0016681005372000592
eps: 0.005994842503189409
eps: 0.021544346900318846
eps: 0.07742636826811278
eps: 0.2782559402207126
eps: 1.0
Text(0, 0.5, 'Settling Time')
The Lotka-Volterra model as analogue linear regression¶
Our ecosystem's dynamics are given by the generalized Lotka-Volterra equation,
$$ \frac{dN_i}{dt} = N_i \left( r_i + \sum_{j=1}^N A_{ij} N_j \right) $$
where $N_i$ is the population of species $i$, $r_i$ is the intrinsic growth rate of species $i$, and $A_{ij}$ is the interaction coefficient between species $i$ and $j$. The steady-state solution of this equation solves the linear system
$$ A \mathbf{N}^* = -\mathbf{r}, $$ with $N_i^* \geq 0$ for all $i$.
Without the outer factor $N_i$, the Lotka-Volterra model is a linear time-invariant system. The outer factor adds a positivity constraint. In practice, this leads to sparsity (exclusion) in the steady-state solution $\mathbf{N}^*$.
Equilibria of ecological models as constrained optimization problems are studied in Mehta et al. 2019.
## Set the integration parameters
eq = RandomLotkaVolterra(
n_species=200, # Number of species
random_state=0, # Random seed
eps=1e0, # Degree of functional redundancy
sigma=0.5, # Standard deviation of interaction strengths
d=4.5, # Density limitation strength
kfrac=0.0, # Fraction of species that are redundant
connectivity=0.05 # Fraction of nonzero interactions
)
tmax = 1e8
ic = np.random.uniform(size=eq.n_species)
t, y = eq.integrate(tmax, ic, tol=1e-10)
plt.semilogx(t, y);
plt.xlim(1e-3, None)
plt.xlabel("Time")
plt.ylabel("Species abundance")
plt.show()
## Perform unconstrained linear regression
plt.figure(figsize=(8, 4))
plt.plot(np.linalg.solve(eq.A, -eq.r), y[-1], '.')
plt.xlabel("Linear regression solution")
plt.ylabel("Steady-state of Lotka Volterra model")
plt.show()
## Perform constrained linear regression
from scipy.optimize import nnls, lsq_linear
sol_nnls = lsq_linear(eq.A, -eq.r, bounds=(0, np.inf), tol=1e-10).x
plt.figure(figsize=(4, 4))
plt.plot(sol_nnls, y[-1], '.')
plt.xlabel("Constrained linear regression solution")
plt.ylabel("Steady-state of Lotka Volterra model")
from scipy.stats import spearmanr
corr = spearmanr(sol_nnls, y[-1])
plt.title(f"Correlation: {corr.correlation:.2f}")
Text(0.5, 1.0, 'Correlation: 0.83')
The condition number measures the difficulty of linear regression¶
For a linear regression problem of the form $A \mathbf{x} = \mathbf{b}$ with $A \in \mathbb{R}^{N \times N}$ and $\mathbf{b} \in \mathbb{R}^N$, the solution is given by $\mathbf{x} = A^{-1} \mathbf{b}$.
The condition number of $A$ is given by
$$ \kappa(A) \equiv \|A\|_2 \|A^{-1}\|_2 $$
This measures how sensitive the solution is to small changes in the input $\mathbf{b}$. Physically, the condition number measures how close the matrix is to being singular (non-invertible).
def condition_number(A):
return np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
A = np.random.normal(size=(10, 10))
A[:, 1] = A[:, 0] + 1e-9 # One column is nearly parallel to another
print(condition_number(A))
2.1265888039790778e+17
We can see how the condition number of $A$ affects the accuracy of linear regression by solving a linear regression where the matrix has a controllable degree of parallelism between two columns
$$ A = \begin{pmatrix} 1 & 1 \\ 1 & 1 + \epsilon \end{pmatrix} $$
by varying $\log(1/\epsilon)$, we increase the condition number of $A$.
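A quick numerical check that the condition number of this matrix blows up like $1/\epsilon$:

```python
import numpy as np

# The condition number of [[1, 1], [1, 1 + eps]] grows roughly like 4 / eps
for eps in [1e-1, 1e-3, 1e-6]:
    A = np.array([[1.0, 1.0], [1.0, 1.0 + eps]])
    print(f"eps = {eps:.0e}, cond(A) = {np.linalg.cond(A):.2e}")
```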
We can test for the effect of ill-conditioning by randomly sampling problems of the form $A \mathbf{x} = \mathbf{b}$, and then solving for $\mathbf{x}$ using the built-in solver. $$ Err = \| {A}^{-1} \mathbf{b} - \mathbf{x} \| $$
import time
np.random.seed(0) # Set the random seed for reproducibility
all_errs, all_condition_numbers, all_solve_times = [], [], []
for eps in np.logspace(-5, 0, 100): # Sweep epsilon from 1e-5 to 1
## Generate a random matrix and a random target
a = np.random.normal(size=(100, 100))
a[:, 1] = a[:, 0] + eps # Control the condition number (how parallel two columns are)
x_true = np.random.normal(size=100) # True solution
b = a @ x_true # Compute the right-hand side
## Solve the linear system using a built-in solver
start_time = time.perf_counter()
x_pred = np.linalg.solve(a, b)
end_time = time.perf_counter()
err = np.linalg.norm(x_pred - x_true)
## Store the error, the condition number, and the solve time
all_errs.append(err)
all_condition_numbers.append(np.linalg.cond(a))
all_solve_times.append(end_time - start_time)
plt.figure(figsize=(4, 4))
plt.loglog(all_condition_numbers, all_errs, '.')
plt.xlabel("Condition number")
plt.ylabel("Error of inversion")
plt.show()
The condition number also affects the physical walltime of a linear solve¶
import time
sizes = np.arange(100, 2000, 10)
condition_numbers = []
times = []
for n in sizes:
# Generate a random matrix and a random vector
A = np.random.rand(n, n)
b = np.random.rand(n)
# Compute the condition number
cond_num = np.linalg.cond(A)
# Measure the time to solve the system using a built-in linear solver
start_time = time.time()
np.linalg.solve(A, b)
end_time = time.time()
# Store the time taken and condition number
condition_numbers.append(cond_num)
times.append(end_time - start_time)
plt.figure()
plt.loglog(condition_numbers, times, '.', markersize=10)
plt.xlabel('Condition Number')
plt.ylabel('Time Taken (s)')
plt.title('Condition Number vs Time Taken')
plt.show()
The condition number generically measures computational hardness¶
Any constrained optimization problem has a condition number, not just linear regression. It generically measures the distance to the singular (unsolvable) case. See Renegar (1995)
Unlike $\epsilon$, which is a property of our particular model for generating interaction matrices $A$, the condition number can be measured from any interaction matrix $A$.
Two nearly-parallel rows of $A$ are equivalent to two species being nearly redundant (interchangeable in the community). Functional redundancy is a form of ill-conditioning.
tmax = 1e8
n_val = 200
ic = np.random.uniform(size=n_val)
settling_times = []
eps_vals = np.logspace(-5, 0, 10) # Vary the distance to exact functional redundancy
condition_numbers = []
for eps in eps_vals:
print(f"eps: {eps}")
eq = RandomLotkaVolterra(
n_species=n_val, # Number of species
random_state=0, # Random seed
eps=eps, # Distance to exact functional redundancy
sigma=2.0, # Standard deviation of interaction strengths
d=4.5, # Density limitation strength
kfrac=0.2, # Fraction of species that are redundant
connectivity=0.05 # Fraction of nonzero interactions
)
t, y = eq.integrate(tmax, ic)
tind = compute_settling_time(y)
settling_times.append(t[tind])
condition_numbers.append(np.linalg.cond(eq.A))
# plt.figure(figsize=(5, 5))
# plt.loglog(eps_vals, settling_times, '.')
# plt.xlabel("Distance to Exact Functional Redundancy")
# plt.ylabel("Settling Time")
# plt.figure(figsize=(5, 5))
# plt.loglog(eps_vals, condition_numbers, '.k')
# plt.xlabel("Distance to Exact Functional Redundancy")
# plt.ylabel("Condition Number")
plt.figure(figsize=(5, 5))
plt.loglog(condition_numbers, settling_times, '.')
plt.xlabel("Condition Number")
plt.ylabel("Settling Time")
eps: 1e-05
eps: 3.5938136638046256e-05
eps: 0.0001291549665014884
eps: 0.0004641588833612782
eps: 0.0016681005372000592
eps: 0.005994842503189409
eps: 0.021544346900318846
eps: 0.07742636826811278
eps: 0.2782559402207126
eps: 1.0
Text(0, 0.5, 'Settling Time')
We therefore see that there is a relationship between a system parameter (condition number) and the time to reach a steady-state.
Krylov bound on linear solvers¶
In numerical analysis, for iterative first-order solvers for linear systems, the expected convergence time is given by
$$ \tau \propto \left[ \log \left( \frac{\kappa(A) + 1}{\kappa(A) - 1} \right) \right]^{-1} $$
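We can see this qualitatively with `scipy.sparse.linalg.cg` on symmetric positive-definite matrices of prescribed condition number (a sketch; conjugate gradients actually converges at a rate involving $\sqrt{\kappa}$ rather than $\kappa$, but the iteration count still grows with conditioning):

```python
import numpy as np
from scipy.sparse.linalg import cg

np.random.seed(0)
n = 100
Q, _ = np.linalg.qr(np.random.normal(size=(n, n)))  # random orthogonal basis
iteration_counts = []
for kappa in [10, 100, 1000]:
    # Build an SPD matrix with prescribed condition number from its spectrum
    spectrum = np.logspace(0, np.log10(kappa), n)
    A = Q @ np.diag(spectrum) @ Q.T
    b = np.random.normal(size=n)
    count = []
    x, info = cg(A, b, callback=lambda xk: count.append(1))  # count CG iterations
    iteration_counts.append(len(count))
    print(f"kappa = {kappa:5d}, CG iterations = {len(count)}")
```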
## Loads a figure from the resources folder
Image("./resources/fig_overview.jpg", width=1200)
Supertransients tend to arise in hard optimization problems¶
Ercsey-Ravasz & Toroczkai (2011): constraint load leads to slowdown and transient chaos in continuous-time k-SAT solvers
display(Image("./resources/ksat.png", width=600))
Dimensionality reduction as preconditioning¶
Often we want to use PCA, etc., to project high-dimensional time series into a low-dimensional space. In ecological time series, these reduced-order variables are called ecomodes, and they correspond to combinations of species that co-vary.
Our original Lotka-Volterra dynamics unfold in a full $N$-dimensional space
$$ \dot{\mathbf{N}} = \mathbf{f}(\mathbf{N}) $$
We can instead study the projected dynamics by finding a projection operator $P \in \mathbb{R}^{k \times N}$, $k \ll N$, that maps the dynamics into a lower-dimensional space of dimension $k$
$$ \mathbf{x} \equiv P \mathbf{N}, \quad \dot{\mathbf{x}} = P \dot{\mathbf{N}} $$
where $\mathbf{x}(t) \in \mathbb{R}^k$ is a lower-dimensional state vector.
## Start by simulating a full-dimensional state vector
eq = RandomLotkaVolterra(n_species=100, random_state=0)
ic = np.random.uniform(size=eq.n)
t, y = eq.integrate(tmax, ic)
plt.figure()
plt.semilogx(t, y);
plt.xlabel("Time")
plt.ylabel("Species abundance")
plt.xlim(1e-2, None)
(0.01, np.float64(794328234.7242821))
How to choose a good projection $P$?¶
- Since ill-conditioning leads to timescale separation, we seek a projection $P$ satisfying the following properties:
$$ \kappa(P A P^\top) \ll \kappa(A) $$
In numerical analysis, this is known as a preconditioner. If we have a stiff or ill-conditioned linear system, we can precondition it with an appropriate choice of $P$.
In PDEs, spectral methods can be thought of as a preconditioner. Some problems are ill-conditioned in the physical space, well-conditioned in the Fourier space.
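As a toy illustration of preconditioning, separate from the SVD projection used below: a diagonal (Jacobi) preconditioner $M = \mathrm{diag}(A)$ can repair ill-conditioning caused by badly scaled rows (the construction here is illustrative):

```python
import numpy as np

np.random.seed(0)
n = 50
W = np.random.normal(size=(n, n)) + 5 * np.identity(n)  # moderately conditioned core
row_scales = np.logspace(-4, 4, n)                      # badly scaled rows
A = np.diag(row_scales) @ W                             # ill-conditioned by construction
M_inv = np.diag(1.0 / np.diag(A))                       # Jacobi preconditioner, M = diag(A)
print(f"cond(A)       = {np.linalg.cond(A):.2e}")
print(f"cond(M^-1 A)  = {np.linalg.cond(M_inv @ A):.2e}")
```

Left-multiplying by $M^{-1}$ undoes the row scaling, leaving only the moderate conditioning of the core matrix.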
print(f"The condition number of the full system is {np.linalg.cond(eq.A)}")
The condition number of the full system is 4429801842.761199
Singular value decomposition¶
- Every matrix $A$ can be decomposed into a product of three matrices:
$$ A = U \Sigma V^\top $$
where $U \in \mathbb{R}^{N \times N}$ and $V \in \mathbb{R}^{N \times N}$ are orthogonal matrices (rotation matrices), and $\Sigma \in \mathbb{R}^{N \times N}$ is a diagonal matrix of non-negative singular values (in decreasing order).
We view $U$ as a rotation matrix that rotates our problem into more favorable coordinates.
U, S, V = np.linalg.svd(eq.A) # NumPy returns V transposed: the rows of this V are the right singular vectors
plt.figure(figsize=(5, 5))
plt.plot(S);
plt.xlabel("Rank")
plt.ylabel("Singular value")
plt.title("Singular values of the interaction matrix")
plt.semilogy();
How do we interpret the orthogonal matrices $U$ and $V$?¶
The right singular vectors $V_i$ isolate modes with dynamical timescales $\tau_i \sim 1 / \sigma_i$.
These will give the same results as PCA ecomodes only if (1) the dynamics are an Ornstein-Uhlenbeck process, and (2) all interactions are symmetric (no predator-prey terms)
How general is SVD for isolating fast and slow modes? Check out Weyl's inequalities as discussed in Thibeault et al. 2024.
## Loads a figure from the resources folder
Image("./resources/fig_svd.jpg", width=1000)
U, S, V = np.linalg.svd(eq.A)
P_fast, P_slow = V[:10, :], V[-10:, :]
yp_fast = P_fast @ y.T # First k modes
yp_slow = P_slow @ y.T # Last k modes
plt.figure()
plt.semilogx(t, y);
plt.xlabel("Time")
plt.ylabel("Species amplitude")
plt.xlim(1e-2, None)
plt.figure()
plt.semilogx(t, yp_fast.T);
plt.xlabel("Time")
plt.ylabel("Fast mode amplitude")
plt.xlim(1e-2, None)
plt.figure()
plt.semilogx(t, yp_slow.T);
plt.xlabel("Time")
plt.ylabel("Slow mode amplitude")
plt.xlim(1e-2, None)
(0.01, np.float64(794328234.7242821))
Preconditioning: We can compute the condition number of the dynamics projected into the subspaces spanned by the fast and slow modes.
$$ A_\text{slow} = P_\text{slow} A P_\text{slow}^\top $$
For a linear dynamical system, this would also correspond to the Jacobian of the projected system.
print("Original condition number: ", np.linalg.cond(eq.A))
print("Fast modes condition number: ", np.linalg.cond(P_fast @ eq.A @ P_fast.T))
print("Slow modes condition number: ", np.linalg.cond(P_slow @ eq.A @ P_slow.T))
Original condition number: 4429801842.761199 Fast modes condition number: 1.5851693011189856 Slow modes condition number: 153.51209647795702
Why does this work? The condition number can alternatively be expressed as the ratio of the largest and smallest singular values of the matrix.
$$ \kappa(A) = \frac{\sigma_{\text{max}}(A)}{\sigma_{\text{min}}(A)} $$
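We can confirm this identity directly against NumPy (`np.linalg.cond` uses the 2-norm by default):

```python
import numpy as np

np.random.seed(0)
A = np.random.normal(size=(50, 50))
sigma = np.linalg.svd(A, compute_uv=False)  # singular values, in decreasing order
print(sigma[0] / sigma[-1], np.linalg.cond(A))  # the two values agree
```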
When would ill-conditioning emerge in a random ecosystem?¶
- We revisit the case where the dynamics are not ill-conditioned, the Gaussian Lotka-Volterra model.
eq = GaussianLotkaVolterra(
n_species=100,
d=8.5,
random_state=0,
)
tmax = 1000
ic = np.random.uniform(size=eq.n)
t, y = eq.integrate(tmax, ic)
plt.figure()
plt.semilogx(t, y);
plt.xlabel("Time")
plt.ylabel("Species abundance")
nonzero_remaining = compute_number_survive(y[-1])
plt.title(f"Number of surviving species: {nonzero_remaining} / {eq.n_species}")
plt.xlim(1e-2, t[-1])
(0.01, np.float64(179.5891149901866))
Evolving an ecosystem towards higher diversity¶
We now perform a simple evolutionary algorithm to evolve the ecosystem to have more surviving species.
We start with a random interaction matrix, simulate the ecosystem, and find the surviving species.
If a species is excluded from the ecosystem, we replace one of its interaction coefficients in the next generation with the average of the corresponding coefficients from two randomly chosen surviving species.
If $i$ is the index of the extinct species, $j$ and $k$ are the indices of the two surviving parents, and $m$ is a randomly chosen interaction partner, we update the interaction matrix as
$$ a_{i m} = 0.5\, a_{j m} + 0.5\, a_{k m} $$
We repeat this process for 100 generations. Our process resembles a Moran process with infinite timescale separation between evolutionary and ecological dynamics.
all_fitnesses = []
all_condition_numbers = []
for j in range(100):
    ## Simulate the ecosystem with the current interaction matrix
    t, y = eq.integrate(tmax, ic)
    sol = y[-1]

    ## Find the surviving and extinct species
    where_survive = (sol > 1 / eq.n)
    surviving_inds = np.where(where_survive)[0]
    extinct_inds = np.setdiff1d(np.arange(eq.n), surviving_inds)

    ## Recombination: replace one interaction coefficient of each extinct species
    ## with the average of the corresponding coefficients of two surviving parents
    new_matrix = eq.A.copy()
    for i in extinct_inds:
        parent_ind1, parent_ind2 = np.random.choice(surviving_inds, size=2, replace=False)
        interaction_ind = np.random.choice(np.setdiff1d(np.arange(eq.n), [parent_ind1, parent_ind2, i]))
        new_matrix[i, interaction_ind] = 0.5 * eq.A[parent_ind1, interaction_ind] + 0.5 * eq.A[parent_ind2, interaction_ind]

    ## Update the interaction matrix
    eq.A = new_matrix.copy()

    fitness = np.sum(where_survive)
    print(f"Iteration {j}: The number of surviving species is {fitness} out of {eq.n}")

    ## Save the fitness and condition number for this generation
    all_fitnesses.append(fitness)
    all_condition_numbers.append(np.linalg.cond(eq.A[surviving_inds][:, surviving_inds]))
Iteration 0: The number of surviving species is 49 out of 100
Iteration 1: The number of surviving species is 50 out of 100
Iteration 2: The number of surviving species is 52 out of 100
...
Iteration 97: The number of surviving species is 85 out of 100
Iteration 98: The number of surviving species is 85 out of 100
Iteration 99: The number of surviving species is 85 out of 100
plt.plot(all_fitnesses);
plt.xlabel("Generation")
plt.ylabel("Number of surviving species")
plt.semilogy(all_condition_numbers)
plt.xlabel("Generation")
plt.ylabel("Condition number")
- We find that the condition number of the surviving community's interaction matrix tends to increase as diversity increases.
- This is because the space of allowed $A$ matrices becomes more constrained.
- Chen & Dongarra (2008) give an expected scaling of $\kappa(A)$ with matrix size; here, the effective size is set by the number of active constraints in the associated non-negative least squares problem.
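As a rough illustration of this scaling (using plain i.i.d. Gaussian matrices, not the notebook's ecological interaction matrices), the condition number of an $n \times n$ Gaussian random matrix tends to grow with $n$:

```python
import numpy as np

rng = np.random.default_rng(0)

def median_cond(n, trials=20):
    """Median 2-norm condition number of n x n i.i.d. Gaussian matrices."""
    return np.median([np.linalg.cond(rng.normal(size=(n, n))) for _ in range(trials)])

# condition numbers grow as the matrix dimension increases
for n in [10, 50, 200]:
    print(n, median_cond(n))
```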
## Loads a figure from the resources folder
Image("./resources/fig_evolve.jpg", width=1200)
Appendix¶
The Fast Lyapunov Indicator measures excitability¶
For each trajectory $\mathbf{N}(t)$ governed by the Lotka-Volterra equation, we integrate the variational equation corresponding to the Jacobian matrix $\dot{\mathbf{w}}(t) = \mathbf{J}[\mathbf{N}(t)] \mathbf{w}(t)$, with initial conditions $\mathbf{w}(0) = I \in \mathbb{R}^{N \times N}$.
The quantity $$ \lambda_{F} = \max_{t} \log\| \mathbf{w}(t) \|_2 $$ is called the Fast Lyapunov Indicator (FLI).
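A minimal sketch of this idea for constant linear dynamics $\dot{\mathbf{w}} = J \mathbf{w}$ (a simplifying assumption for illustration; the notebook integrates the variational equation along full Lotka-Volterra trajectories): for a stable but non-normal $J$, the FLI picks up the transient growth of $\|\mathbf{w}(t)\|_2$ even though all eigenvalues have negative real part.

```python
import numpy as np
from scipy.linalg import expm

def fli_linear(J, tmax=10.0, nt=200):
    """Fast Lyapunov Indicator for dw/dt = J w with w(0) = I:
    the maximum over t of log ||expm(J t)||_2."""
    ts = np.linspace(0, tmax, nt)
    return max(np.log(np.linalg.norm(expm(J * t), ord=2)) for t in ts)

J_normal = -np.eye(2)                    # stable and normal: no transient growth
J_nonnormal = np.array([[-1.0, 10.0],
                        [0.0, -1.0]])    # stable but non-normal: transient amplification

print(fli_linear(J_normal))      # 0.0: the norm never exceeds its initial value of 1
print(fli_linear(J_nonnormal))   # > 0: a transient excursion above 1 before decay
```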
## Loads a figure from the resources folder
Image("./resources/fig_chaos.jpg", width=1200)